Author: Marko Kangrga

Bond Screen using CapIQ Data

An attempt to run a simple linear regression on a population of bonds downloaded from Capital IQ to detect outliers and screen for relative-value opportunities, identifying where to focus deeper-dive work.

The Excel files contain ~70 columns, spanning bond-specific descriptive fields, such as coupon, maturity, and covenants, as well as company fundamentals, such as Net Debt/EBITDA, EV/EBITDA, EBITDA margin, and Debt/Capital ratio. The files are combined into a pandas DataFrame, which is cleaned up and used to run a multiple linear regression with YTW as the dependent (target) variable. This is just a proof of concept; eventually the target should be either spread-to-worst or option-adjusted spread.

In [1]:
import re, glob, os, time, datetime
from dateutil.parser import parse
import pandas as pd
pd.options.display.max_columns = None
from xlsxwriter.utility import xl_rowcol_to_cell, xl_cell_to_rowcol
import numpy as np
import scipy
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import blackscholes
%matplotlib inline

from sklearn import ensemble, neural_network, tree, linear_model, kernel_ridge, neighbors, svm, naive_bayes
from sklearn import preprocessing, metrics, decomposition, model_selection, feature_selection
from sklearn.externals import joblib
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.base import TransformerMixin, BaseEstimator

from IPython.display import display, HTML, clear_output
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; font-family: Monaco; font-size: 10pt;}
    div#menubar-container     { width: 95%; font-family: Monaco; font-size: 10pt;}
    div#maintoolbar-container { width: 99%; font-family: Monaco; font-size: 10pt;}
    .CodeMirror{font-size: 8pt;}
</style>
"""))

import field_constants
matplotlib.rcParams['figure.figsize'] = (16.0, 8.0)

Dataset buildout

Run once to build a pickle from Excel files downloaded from a broad CapitalIQ screen

In [2]:
# dataset_path = os.path.join('datasets', 'us_europe_ds', '20180430')
# files = [f for f in os.listdir(os.path.join(dataset_path, 'excel'))]
# df = pd.concat([pd.read_excel(os.path.join(dataset_path, 'excel', f), skiprows=7) for f in files])
# df.reset_index(inplace=True, drop=True)
# df.to_pickle(os.path.join(dataset_path, 'dataset.pkl'))

Load and preview the dataset

Load dataset from a pickled CapIQ screen

In [3]:
dataset_path = os.path.join('datasets', 'us_europe_ds', '20180430')
df = pd.read_pickle(os.path.join(dataset_path, 'dataset.pkl'))

# Map CapitalIQ field names to short, presentable names that can be more easily managed
numeric_fields, short_column_names, column_mapping,\
bond_id_columns,ratings_sorted, numeric_fields_mapped,\
group_categories, sorted_results_mapping, sorted_results_columns = field_constants.get_fields(df)
df.head(3)
Out[3]:
[head(3) output omitted: three rows spanning the ~70 raw CapIQ columns, from bond descriptors (Maturity Date, Issuer, Security Type, Seniority, Coupon Rate, S&P ratings, Yield to Worst, Duration, Next Call Date/Price) through issuer/parent fundamentals (Net Debt/EBITDA, EBITDA Margin, Total Debt/Capital, EV, NTM EBITDA, growth estimates). The verbose column names are unreadable in text form and are mapped to short names below.]

Helper functions for transforming the dataset

The above dataset is clearly messy. The functions below help clean it up and map column names to something more readable.

In [4]:
def engineer_features(df, drop_perpetual_maturities=True, ytw_cutoff=100, maturity_cutoff=150, excl_no_data_columns=None,
                      fill_mean_columns=None, winsorize_columns=(None, None), country_list=None, fillna=0, show_exclusion_metrics=False):

    '''Clean up dataset: exclude/include bonds with no leverage metrics,
    drop perpetual maturities, screen out by specified yield and maturity cutoffs, 
    force convert numeric columns, clean up dates, combine local/foreign ratings
    and apply specific fixes.
    
    Parameters
    ----------
    df : Pandas DataFrame
        Dataframe to apply transformations to.
        
    excl_no_data_columns : list, default: None
        Choose which securities to exclude if there is no data.
    
    drop_perpetual_maturities : bool, default: True
        Choose whether to drop securities with perpetual maturities.
        
    ytw_cutoff : int, default: 100
        Filter out securities with yields at or above the specified threshold.
    
    maturity_cutoff : int, default: 150
        Filter out securities with maturities (in years) at or above the specified threshold.
    
    show_exclusion_metrics: bool, default: False
        Print the exclusion metrics.
        
    fill_mean_columns: list, default: None
        Fill sector/country averages for NaN values in selected columns.

    winsorize_columns: tuple(list, float), default: (None, None)
        A tuple of (list, float), specifying a list of columns to winsorize extreme
        values on and at which percentile.

    country_list: list, default: None
        List of countries to include by 'Country of Incorporation'.

    fillna: str or int, default: 0
        Fill all NaN datapoints in the dataframe with the specified value.
    
    Returns
    ----------
    Clean pandas dataframe with selected transformations applied.
    '''
    
    
    exclusion_metrics = {}
    
    df = df.copy()
    
    
    if show_exclusion_metrics:
        if country_list is not None:
            exclusion_metrics['Country Exclusions'] = (~df['Country of Incorporation'].isin(country_list)).sum()
        exclusion_metrics['YTW Cutoff'] = (df['YTW'] >= ytw_cutoff).sum()
        exclusion_metrics['Perpetuals'] = (df['Maturity Date'] == 'Perpetual').sum()
        if excl_no_data_columns is not None:
            for col in excl_no_data_columns:
                exclusion_metrics['No data: {}'.format(col)] = df[col].isnull().sum()
    
    df = df.loc[df['YTW'] < ytw_cutoff, :]
    df = df[df['Country of Incorporation'].isin(country_list)] if country_list is not None else df
    
    if excl_no_data_columns is not None:
        for col in excl_no_data_columns:
            df = df[df[col].notna()]

    df = df.pipe(transform_perpetuals, drop=drop_perpetual_maturities)
    df = df.pipe(date_cleanup)
    
    exclusion_metrics['Maturity Cutoff'] = (df['Years to Maturity'] >= maturity_cutoff).sum()
    
    df = df.loc[df['Years to Maturity'] < maturity_cutoff, :]
    df = df.pipe(ratings_cleanup)
    df = df.pipe(combine_ratings)
    df = df.pipe(specific_fixes)
    
    # Add calculated features, such as EV/EBITDA, winsorize extreme values, and calculate callable bond delta/premium.
    df = df.pipe(add_features)
    df = df.pipe(winsorize, winsorize_columns[0], winsorize_columns[1]) if winsorize_columns[0] is not None else df
    df['Call Delta'] = df[['Price', 'Next Call Price', 'Years to Next Call']].apply(get_bond_delta, axis=1)
    df['Call Premium'] = ((df['Next Call Price'] / df['Price'])**(1/df['Years to Next Call'].clip(lower=1)) - 1) * 100
    mask = df['Sector'] == '-'
    df.loc[mask, 'Sector'] = df.loc[mask, 'Sector (Parent)']
#     df = df.pipe(add_zscore)

    df = df.pipe(fillna_mean, columns=fill_mean_columns, group=['Sector', 'Country of Incorporation']) if fill_mean_columns is not None else df
    df = df.fillna(fillna)

    df = df.sort_values('Issuer')
    if show_exclusion_metrics:
        for k, v in exclusion_metrics.items():
            print('{}: {}'.format(k, v))
    return df

def clean_df(df):
    'Rename columns to reasonably mapped names, force numeric fields.'
    df = df.copy()
    df = df.rename(column_mapping, axis=1)
    df.loc[:, numeric_fields_mapped] = df.loc[:, numeric_fields_mapped].apply(pd.to_numeric, errors='coerce')
    return df

def get_bond_delta(row):
    "Get the 'delta' of a callable bond using a simple Black model. To be used with df.apply() pandas function"
    
    s, k, t = row.values
    if np.isnan(k) or t <= 0:
        return np.nan
    try:
        delta = blackscholes.BSMerton([1, s, k, .02, .00, t*365, .2]).delta()
    except Exception:
        return np.nan
    return delta[0]

def add_features(df):
    'Add various calculated fields.'
    
    df.loc[:, 'EV/EBITDA Fwd'] = df['EV'] / df['NTM EBITDA']
    df.loc[:, 'EV/EBITDA Fwd (Parent)'] = df['EV (Parent)'] / df['NTM EBITDA (Parent)']
    df.loc[:, 'Net Debt/EBITDA'] = df['Net Debt (Parent)'] / df['EBITDA (Parent)']
    df.loc[:, 'Net Debt/EBITDA Fwd (Parent)'] = df['Net Debt (Parent)'] / df['NTM EBITDA (Parent)']
    df.loc[:, 'Current Yield'] = (df['Coupon'] / df['Price']) * 100
#     df.loc[:, 'Net Debt/EBITDA Fwd (Parent)'] = df.loc[:, 'Net Debt/EBITDA Fwd (Parent)'].fillna(df['Net Debt/EBITDA'])
    return df

def transform_perpetuals(df, drop=True, set_maturity='2118-12-31'):
    'Drop perpetuals, or set them to a specific far-out maturity.'
    df = df.copy()
    if drop:
        df = df.loc[df['Maturity Date'] != 'Perpetual', :]
    elif set_maturity is not None:
        df.loc[df['Maturity Date'] == 'Perpetual', 'Maturity Date'] = parse(set_maturity)
    return df

def date_cleanup(df):
    '''Convert perpetual maturities to specific long-term maturities to allow for computations.
    Add Years to Maturity and Years to Next Call fields.'''
    
    df['Years to Maturity'] = (pd.to_datetime(df['Maturity Date'])-datetime.datetime.today())/datetime.timedelta(days=1)/365
    df['Years to Next Call'] = pd.to_datetime(df['Next Call Date'], errors='coerce').apply(
        lambda x: np.round((x-datetime.datetime.today())/datetime.timedelta(days=1)/365, decimals=2) if pd.notna(x) else np.nan)
    return df

def ratings_cleanup(df):
    'Remove S&P specific designators from ratings.'
    
    df.loc[:, 'Rating (Local Currency)'] = df.loc[:, 'Rating (Local Currency)'].apply(lambda x: x.replace(' (sf)', ''))
    df.loc[:, 'Rating (Foreign Currency)'] = df.loc[:, 'Rating (Foreign Currency)'].apply(lambda x: x.replace(' (sf)', ''))
    return df

def combine_ratings(df):
    'Combine local and foreign currency ratings from the S&P.'

    df['Rating'] = df['Rating (Local Currency)'].replace('-', np.nan).fillna(df['Rating (Foreign Currency)'])
    df['Rating'] = df['Rating'].replace('AA-/A-1+', '')
    return df

def specific_fixes(df):
    'Apply one-off fixes for known bad datapoints.'
    # fix Roche bond call price from 1000 to 100
    df.loc[df['Security ID'] == 'IQT315939319', 'Next Call Price'] = 100
    # fix Intel bond call price from 1000 to 100
    df.loc[df['Security ID'] == 'IQT338089502', 'Next Call Price'] = 100
    return df

def encode_ratings(df):
    'Encoding ratings in a single field based on a sorted list of ratings, rather than one-hot encoder.'
    
    encoded_ratings_dict = dict([(k, v) for k, v in zip(ratings_sorted, range(1,len(ratings_sorted)+1))])
    df['Rating'] = df['Rating'].apply(lambda x: encoded_ratings_dict[x])
    return df

def winsorize(df, columns, percentile=0.005):
    '''Winsorization function based on percentile.

    Parameters
    ----------
    columns : list
        list of columns to apply winsorization to
        
    percentile : float, default: 0.005
        Percentile to winsorize to

    Returns
    ----------
    Pandas dataframe with winsorization applied to selected columns.
    '''

    for field in columns:
        df.loc[df[field].notna(), field] = scipy.stats.mstats.winsorize(df.loc[df[field].notna(), field], limits=(0, percentile))
    return df
    
def fillna_mean(df, columns, group):
    '''Fill NaN with means from a group.

    Parameters
    ----------
    columns : list
        list of columns to apply group means to NaN values
        
    group : str or list
        Column name(s) to group by

    Returns
    ----------
    Pandas dataframe with group means applied to selected columns.
    '''

    df.loc[:, columns] = df.groupby(group)[columns].transform(lambda x: x.fillna(x.mean()))
    return df

def add_zscore(df):
    'Transform values to z-scores for all numerical columns.'
    numeric_fields_mapped = [column_mapping[field] for field in numeric_fields]
    numeric_fields_zscore = [field +'_zscore' for field in numeric_fields_mapped]
    df_zscores = df.loc[:, numeric_fields_mapped].apply(lambda x: (x - x.mean())/x.std(ddof=0))
    df_zscores.columns = numeric_fields_zscore
    df = df.join(df_zscores)
    return df

def missing_values_table(df):
    'Tabulate missing-value counts and percentages by column.'
    mis_val = df.isnull().sum()
    mis_val_percent = 100 * mis_val / len(df)
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    mis_val_table = mis_val_table.rename(columns={0: 'Missing Values', 1: '% of Total Values'})
    mis_val_table = mis_val_table[mis_val_table.iloc[:, 1] != 0].sort_values(
        '% of Total Values', ascending=False).round(1)
    print('Your selected dataframe has {} columns.\n'
          'There are {} columns that have missing values.'.format(df.shape[1], mis_val_table.shape[0]))
    return mis_val_table
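The group-mean fill performed by `fillna_mean` reduces to a single groupby-transform pattern. A minimal sketch on toy data (not from the screen):

```python
import numpy as np
import pandas as pd

# Toy frame: one NaN leverage value to be filled with its sector's mean
toy = pd.DataFrame({
    'Sector': ['Energy', 'Energy', 'Utilities', 'Utilities'],
    'Net Debt/EBITDA': [2.0, np.nan, 4.0, 6.0],
})

# Same pattern as fillna_mean(): group, then fill NaN with the group mean
toy['Net Debt/EBITDA'] = (toy.groupby('Sector')['Net Debt/EBITDA']
                             .transform(lambda x: x.fillna(x.mean())))
print(toy['Net Debt/EBITDA'].tolist())  # [2.0, 2.0, 4.0, 6.0]
```

Only the NaN in the Energy group is touched, and it picks up the Energy mean (2.0), not the global mean.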

Quick YTW/Maturity Visualization

In [5]:
fill_mean_columns = ['Net Debt/EBITDA', 'Net Debt/EBITDA Fwd (Parent)', 'EV/EBITDA Fwd (Parent)', '2y FWD EBITDA Growth (Parent)']
country_list = ['France', 'Italy', 'United States', 'Switzerland', 'United Kingdom', 'Sweden', 'Finland', 'Germany', 'Spain', 'Belgium', 'Canada', 'Australia']

df_clean = clean_df(df)
df_analysis = engineer_features(df_clean, excl_no_data_columns=None, drop_perpetual_maturities=True, ytw_cutoff=25,
                                maturity_cutoff=100, fill_mean_columns=fill_mean_columns, country_list=country_list,
                                 fillna=0, show_exclusion_metrics=True)

fig, ax = plt.subplots(1, figsize=(15,8))
ax.scatter(df_analysis['Years to Maturity'],
           df_analysis['YTW']);
ax.set_xlabel('Years to Maturity')
ax.set_ylabel('Yield to Worst');
print('Number of bonds selected/total: {}/{} (excl: {})'.format(df_analysis.shape[0], df.shape[0], df.shape[0]-df_analysis.shape[0]))
YTW Cutoff: 18
Perpetuals: 505
Country Exclusions: 8585
Maturity Cutoff: 1
Number of bonds selected/total: 21710/30640 (excl: 8930)

Selecting reasonable features

In [6]:
features = [
     'Seniority',
     'Coupon',
     'Coupon Type',
     'Amount Outstanding',
     'Rating',
     'Duration',
     'Currency',
     'Sector',
     'Primary Industry',
     'Country of Incorporation',
     'Years to Maturity',
     'Security Type',

     'Next Call Price',
#      'Call Premium',
     'Call Delta',
     'Conversion Premium',
    
     'Net Debt/EBITDA',
     'Net Debt/EBITDA Fwd (Parent)',
     'EV/EBITDA Fwd',
     'EV/EBITDA Fwd (Parent)',
     'LTM EBITDA Growth (Parent)',
     '2y FWD EBITDA Growth (Parent)',
     '2y Fwd Revenue Growth (Parent)',
     'EBITDA Margin (Parent)',
     'Total Debt/Capital',
     'Total Debt/Capital (Parent)',
    
#      'Price Change (3m %)',
#      'Price',
#      'Current Yield',
#      'Years to Next Call',
#      'No of Analysts (Parent)',
#      'Price Change (3m %)',
]
le = preprocessing.LabelEncoder()
lb = preprocessing.LabelBinarizer()
mlb = preprocessing.MultiLabelBinarizer()
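For reference, `pd.get_dummies` (used on the feature set below) one-hot encodes every object-typed column while passing numeric columns through unchanged. A toy example, not the actual screen data:

```python
import pandas as pd

# Two bonds: numeric coupon plus a categorical seniority field
toy = pd.DataFrame({'Coupon': [4.5, 5.0],
                    'Seniority': ['Senior Unsecured', 'Senior Secured']})

# Categorical columns become 'Column_value' indicator columns
encoded = pd.get_dummies(toy)
print(sorted(encoded.columns))
# ['Coupon', 'Seniority_Senior Secured', 'Seniority_Senior Unsecured']
```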

Clean the dataset

The callable bond delta calculation takes ~5 seconds; see if this can be optimized.
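One possible optimization, sketched here rather than verified against the local `blackscholes.BSMerton` class: compute the Black-Scholes call delta in closed form over whole numpy arrays instead of calling `df.apply` row by row. The 2% rate and 20% vol mirror the constants passed to `BSMerton` above; the NaN handling mirrors the guard clauses in `get_bond_delta`.

```python
import numpy as np
from scipy.stats import norm

def call_delta_vectorized(s, k, t, r=0.02, sigma=0.20):
    """Vectorized Black-Scholes call delta N(d1) for arrays of
    price s, strike k, and time-to-call t in years. NaN strikes and
    non-positive horizons propagate to NaN."""
    s, k, t = (np.asarray(a, dtype=float) for a in (s, k, t))
    t = np.where(t > 0, t, np.nan)  # invalid horizons -> NaN
    d1 = (np.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * np.sqrt(t))
    return norm.cdf(d1)

# At-the-money, 1y to call: roughly 0.58
print(call_delta_vectorized([100.0], [100.0], [1.0]))
```

This would replace the per-row `apply` on `['Price', 'Next Call Price', 'Years to Next Call']` with one array call; whether the numbers match `BSMerton` exactly depends on that class's dividend and day-count conventions.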

In [7]:
fill_mean_columns = ['Net Debt/EBITDA', 'Net Debt/EBITDA Fwd (Parent)', 'EV/EBITDA Fwd (Parent)', '2y FWD EBITDA Growth (Parent)']
country_list = ['France', 'Italy', 'United States', 'Switzerland', 'United Kingdom', 'Sweden', 'Finland', 'Germany', 'Spain', 'Belgium', 'Canada', 'Australia']

df_clean = clean_df(df)
df_analysis = engineer_features(df_clean, excl_no_data_columns=None, drop_perpetual_maturities=True, ytw_cutoff=25,
                                maturity_cutoff=100, fill_mean_columns=fill_mean_columns, winsorize_columns=(['Net Debt/EBITDA'], 0.002),
                                country_list=country_list, fillna=0, show_exclusion_metrics=True)

df_analysis.head(3)
YTW Cutoff: 18
Perpetuals: 505
Country Exclusions: 8585
Maturity Cutoff: 1
Out[7]:
[head(3) output omitted: the same securities under the short mapped column names (Security Type, Seniority, Coupon, Rating, YTW, Duration, Net Debt/EBITDA, ...) plus the engineered fields appended at the right (Years to Maturity, Years to Next Call, EV/EBITDA Fwd, Current Yield, Call Delta, Call Premium).]

Select features and generate the train dataset

Parsing the list of covenants takes the most time (~10 sec).
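A likely faster alternative, sketched on toy covenant strings: the `MultiLabelBinarizer` already instantiated as `mlb` can binarize all covenant sets in one pass, replacing the per-covenant `train.apply(..., axis=1)` loop below.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy covenant strings in CapIQ's '; '-separated format ('-' = none listed)
covs = pd.Series(['Liens; Indebtedness', 'Liens; Sale of Assets', '-'])

mlb = MultiLabelBinarizer()
flags = pd.DataFrame(mlb.fit_transform(covs.str.split('; ')),
                     columns=mlb.classes_, index=covs.index)
flags = flags.drop('-', axis=1, errors='ignore')  # drop the 'no covenants' marker
print(sorted(flags.columns))  # ['Indebtedness', 'Liens', 'Sale of Assets']
```

The resulting frame could be joined onto `train` directly, yielding the same one-flag-per-covenant columns as the loop.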

In [8]:
train = df_analysis[features].copy()

# apply_zscore = ['Amount Outstanding', 'Net Debt/EBITDA']
# train[apply_zscore] = train.groupby(['Country', 'Sector']).transform(lambda x: (x - x.mean())/x.std())[apply_zscore]

# get one-hot encodings for all categoricals
train = pd.get_dummies(train)
train = train.drop('Rating_-', axis=1, errors='ignore')

# get one-hot encodings for all covenants
train['BP'] = df_analysis['Bondholder Protective'].apply(lambda x: frozenset(x.split('; ')))
train['IR'] = df_analysis['Issuer Restrictive'].apply(lambda x: frozenset(x.split('; ')))

for bp in frozenset.union(*train.BP):
    train[bp] = train.apply(lambda _: int(bp in _.BP), axis=1)
train = train.drop(['-', 'BP'], axis=1)
for ir in frozenset.union(*train.IR):
    train[ir] = train.apply(lambda _: int(ir in _.IR), axis=1)
train = train.drop(['-', 'IR'], axis=1)
feature_names = train.columns
df_train_copy = train.copy()

# preprocess the dataset (standard scaler, etc)
pipeline = Pipeline([
    ('scale', preprocessing.StandardScaler()),
#     ('pca', decomposition.PCA()),
])

df_train_copy.head()
Out[8]:
[head() output omitted: the one-hot encoded training matrix, with the numeric features (Coupon, Amount Outstanding, Duration, Years to Maturity, Next Call Price, Call Delta, leverage/valuation ratios) followed by dummy columns for Seniority, Coupon Type, Rating, Currency, Sector, Primary Industry, Country of Incorporation, and Security Type, plus one binary flag per covenant; several hundred mostly-zero columns in total.]
14604 5.000 1500.0 6.102 7.407032 102.500 0.542520 0.0 5.282116 4.883781 0.0 11.675412 14.7 6.85 12.2 44.1 0.0 74.1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0
17238 4.250 1500.0 5.170 5.987854 102.125 0.517710 0.0 5.282116 4.883781 0.0 11.675412 14.7 6.85 12.2 44.1 0.0 74.1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17239 4.250 1500.0 5.170 5.987854 102.125 0.517710 0.0 5.282116 4.883781 0.0 11.675412 14.7 6.85 12.2 44.1 0.0 74.1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Run a Random Forest regressor on all features to analyze feature importances

In [9]:
if __name__ == '__main__':
    # Feature importances
    clf_rf = ensemble.RandomForestRegressor(n_estimators=100, min_samples_leaf=5, max_depth=30, oob_score=True, n_jobs=-1)
    clf_rf.fit(pipeline.fit_transform(df_train_copy), df_analysis['YTW'])
    df_feature_importances = pd.DataFrame(clf_rf.feature_importances_, index=df_train_copy.columns, columns=['importance']).sort_values('importance', ascending=False)

top_features = 50
fig, ax = plt.subplots(1, figsize=(30,8))
ax.bar(df_feature_importances.head(top_features).index, df_feature_importances.head(top_features).importance, tick_label=df_feature_importances.head(top_features).index)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment='right');
ax.set_ylim(0, .4);
ax.set_ylabel('Feature Importance');

Select the top X features to mitigate the curse of dimensionality (more data would help) and generate the final training dataset

In [10]:
# use feature importance for feature selection
select_top_features = 250
feature_names = df_feature_importances.head(select_top_features).index
df_train = df_train_copy[feature_names].copy()
train = pipeline.fit_transform(df_train)
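As a sanity check on the cutoff of 250, the number of retained features can itself be cross-validated. The sketch below is illustrative only: `score_top_k`, the correlation-based ranking, and the synthetic data are stand-ins for `df_feature_importances` and the real training matrix.

```python
import numpy as np
from sklearn import linear_model, model_selection

def score_top_k(X, y, importance_order, ks, n_splits=4):
    """Mean cross-validated R^2 using only the k most important columns."""
    scores = {}
    for k in ks:
        cols = importance_order[:k]
        cv = model_selection.KFold(n_splits=n_splits, shuffle=True, random_state=1)
        scores[k] = model_selection.cross_val_score(
            linear_model.BayesianRidge(), X[:, cols], y, cv=cv, scoring='r2').mean()
    return scores

# Synthetic demo: only the first 10 of 40 columns carry signal
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
y = X[:, :10].sum(axis=1) + 0.1 * rng.normal(size=300)
# Crude importance ranking by absolute correlation with the target
order = np.argsort(-np.abs(np.corrcoef(X.T, y)[-1, :-1]))
scores = score_top_k(X, y, order, ks=[5, 10, 20, 40])
```

Picking the smallest k whose CV score is within noise of the best keeps the final training set compact.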

View mean values of numerical features by sector

In [11]:
df_stats = df_analysis[features + ['Price Change (3m %)']].groupby('Sector').describe()
df_stats.loc[:, (slice(None), ['mean'])].applymap(lambda x: '{:.2f}'.format(x))
Out[11]:
2y FWD EBITDA Growth (Parent) 2y Fwd Revenue Growth (Parent) Amount Outstanding Call Delta Conversion Premium Coupon Duration EBITDA Margin (Parent) EV/EBITDA Fwd EV/EBITDA Fwd (Parent) LTM EBITDA Growth (Parent) Net Debt/EBITDA Net Debt/EBITDA Fwd (Parent) Next Call Price Price Change (3m %) Total Debt/Capital Total Debt/Capital (Parent) Years to Maturity
mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean
Sector
- 0.00 0.00 391.58 0.23 0.00 3.92 4.40 0.00 0.00 0.00 0.00 0.00 0.00 43.56 -0.30 0.00 0.00 19.47
Consumer Discretionary 11.29 4.98 569.64 0.35 0.18 4.99 5.81 16.93 5.85 10.75 28.63 3.66 2.80 58.51 -2.42 128.69 53.46 8.86
Consumer Staples 6.23 3.63 674.94 0.29 0.08 4.12 6.48 17.36 7.88 11.87 8.19 2.78 2.47 44.72 -2.36 46.64 51.57 9.65
Energy 25.85 9.59 546.78 0.38 0.10 5.66 6.28 30.35 5.16 9.98 87.68 3.08 3.35 64.66 -2.92 37.43 44.70 10.04
Financials 7.55 2.02 549.99 0.12 0.06 3.00 5.35 4.51 0.39 6.35 3.19 3.37 2.87 19.98 -0.83 16.40 17.19 7.63
Healthcare 5.37 4.23 739.26 0.34 0.32 4.29 6.76 20.92 7.23 11.27 4.85 2.43 2.11 53.64 -2.51 43.91 41.54 10.58
Industrials 10.31 4.49 511.87 0.32 0.29 4.41 6.54 18.29 5.93 10.69 12.13 3.82 2.35 51.04 -2.03 50.10 52.38 9.98
Information Technology 9.39 7.15 860.28 0.34 0.42 3.97 5.48 25.53 9.61 11.33 12.62 1.26 1.00 54.71 -2.19 45.45 51.70 7.88
Materials 13.10 5.59 501.79 0.38 0.17 5.20 5.42 18.98 5.69 8.73 17.60 2.73 1.94 62.53 -2.37 40.37 43.00 8.05
Real Estate 5.14 4.21 349.35 0.34 0.43 3.67 5.57 48.02 11.96 19.90 10.99 8.14 7.28 51.48 -1.49 42.74 44.95 7.31
Telecommunication Services 2.90 1.34 907.65 0.21 0.15 4.86 7.08 31.85 4.48 6.93 1.85 3.13 2.78 38.39 -1.78 45.97 59.59 11.14
Utilities 4.30 2.47 401.70 0.34 0.07 4.56 8.46 30.18 2.07 9.95 5.72 4.50 4.02 46.95 -2.69 46.13 52.04 13.59

Visualize the distribution of 3-month % price change by sector

In [14]:
y_field = 'Price Change (3m %)'
splot = sns.violinplot(x='Sector', y=y_field, hue='Country', split=True,
                       data=df_analysis.assign(Country=lambda d: d['Country of Incorporation'].apply(lambda c: 'US' if c == 'United States' else 'Non-US')))
splot.set_ylim(df_analysis[y_field].quantile([.001, .999]).values)
[(x.set_rotation(45), x.set_horizontalalignment('right')) for x in splot.get_xticklabels()]
splot.set_title("3m % Price Change Distribution by Sector", fontsize=15);
matplotlib.rcParams['figure.figsize'] = (16.0, 8.0)

Datapoint distribution by sector

In [33]:
df_analysis.replace('-', np.nan).pivot_table(index='Sector', aggfunc='count').div(
    df_analysis.pivot_table(index='Sector', aggfunc='size'), axis=0).applymap(lambda x: '{:.0f}%'.format(x*100)).drop('-')
Out[33]:
2y FWD EBITDA Growth 2y FWD EBITDA Growth (Parent) 2y Fwd Revenue Growth 2y Fwd Revenue Growth (Parent) Amount Outstanding Amount Outstanding ($USDmm, Historical rate) Bondholder Protective Call Delta Call Premium Conversion Premium Conversion Price Conversion Security Conversion Security Price Convexity Country (Issuer) Country (Parent) Country of Incorporation Coupon Coupon Type Currency Current Yield Duration EBITDA EBITDA (Parent) EBITDA Margin EBITDA Margin (Parent) EV EV (Parent) EV/EBITDA Fwd EV/EBITDA Fwd (Parent) Evaluation Date Excel Company ID Exchange:Ticker Geographic Locations [Issuer] Geographic Locations [Ultimate Parent] Gross Spread Industry Classifications Issuer Issuer Restrictive LTM EBITDA Growth LTM EBITDA Growth (Parent) Maturity Date NTM EBITDA NTM EBITDA (Parent) Net Debt Net Debt (Parent) Net Debt/(EBITDA-CAPEX) Net Debt/EBITDA Net Debt/EBITDA Fwd (Parent) Next Call Date Next Call Price No of Analysts No of Analysts (Parent) Offering Amount Offering Date Offering Price Offering Yield Parent Name Payment Frequency Price Price Change (3m %) Primary Industry Principal Amount Rating Rating (Foreign Currency) Rating (Local Currency) Rating Date (Foreign Currency) Rating Date (Local Currency) Sector (Parent) Security ID Security Tickers (Issuer) Security Tickers (Parent) Security Type Seniority Spread Subsidiary Restrictive Total Debt/Capital Total Debt/Capital (Parent) YTW Years to Maturity Years to Next Call
Sector
Consumer Discretionary 100% 100% 100% 100% 100% 100% 49% 100% 100% 100% 100% 1% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 54% 99% 98% 100% 100% 100% 45% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 58% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 92% 12% 80% 12% 80% 98% 100% 54% 77% 100% 100% 100% 41% 100% 100% 100% 100% 100%
Consumer Staples 100% 100% 100% 100% 100% 100% 60% 100% 100% 100% 100% 0% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 70% 100% 97% 100% 100% 100% 53% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 45% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 93% 19% 74% 19% 74% 99% 100% 70% 85% 100% 100% 100% 49% 100% 100% 100% 100% 100%
Energy 100% 100% 100% 100% 100% 100% 59% 100% 100% 100% 100% 0% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 56% 100% 97% 100% 100% 100% 57% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 64% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 94% 13% 82% 13% 82% 100% 100% 57% 87% 100% 100% 100% 48% 100% 100% 100% 100% 100%
Financials 100% 100% 100% 100% 100% 100% 13% 100% 100% 100% 100% 0% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 34% 99% 98% 100% 100% 100% 10% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 20% 100% 100% 100% 100% 100% 100% 100% 100% 97% 100% 100% 100% 100% 61% 23% 38% 23% 38% 96% 100% 38% 61% 100% 100% 100% 5% 100% 100% 100% 100% 100%
Healthcare 100% 100% 100% 100% 100% 100% 60% 100% 100% 100% 100% 1% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 63% 100% 100% 100% 98% 100% 52% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 53% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 98% 100% 91% 12% 79% 12% 79% 99% 100% 63% 79% 100% 100% 100% 44% 100% 100% 100% 100% 100%
Industrials 100% 100% 100% 100% 100% 100% 45% 100% 100% 100% 100% 1% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 57% 100% 99% 100% 99% 100% 39% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 51% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 99% 100% 83% 11% 73% 11% 73% 98% 100% 57% 73% 100% 100% 100% 32% 100% 100% 100% 100% 100%
Information Technology 100% 100% 100% 100% 100% 100% 67% 100% 100% 100% 100% 1% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 83% 100% 97% 100% 100% 100% 56% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 55% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 92% 10% 82% 10% 82% 99% 100% 83% 96% 100% 100% 100% 49% 100% 100% 100% 100% 100%
Materials 100% 100% 100% 100% 100% 100% 50% 100% 100% 100% 100% 1% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 65% 100% 94% 100% 100% 100% 46% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 61% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 91% 27% 64% 27% 64% 98% 100% 66% 83% 100% 100% 100% 44% 100% 100% 100% 100% 100%
Real Estate 100% 100% 100% 100% 100% 100% 40% 100% 100% 100% 100% 2% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 59% 100% 100% 100% 99% 100% 36% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 52% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 99% 100% 66% 4% 62% 4% 62% 99% 100% 60% 80% 100% 100% 100% 32% 100% 100% 100% 100% 100%
Telecommunication Services 100% 100% 100% 100% 100% 100% 42% 100% 100% 100% 100% 0% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 65% 100% 98% 100% 100% 100% 34% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 38% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 87% 21% 66% 21% 66% 100% 100% 66% 98% 100% 100% 100% 16% 100% 100% 100% 100% 100%
Utilities 100% 100% 100% 100% 100% 100% 50% 100% 100% 100% 100% 0% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 24% 100% 100% 100% 100% 100% 46% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 47% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 88% 5% 84% 5% 84% 99% 100% 31% 87% 100% 100% 100% 26% 100% 100% 100% 100% 100%
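The count/size ratio above reduces to "share of non-null values per column, per sector". A minimal sketch of the same calculation on a toy frame (the data below is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Sector': ['Energy', 'Energy', 'Utilities'],
                   'YTW': [5.0, np.nan, 4.0]})
# count (non-null values per column) divided by size (all rows) per sector
pct = (df.pivot_table(index='Sector', aggfunc='count')
         .div(df.pivot_table(index='Sector', aggfunc='size'), axis=0))
```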

Regressor dictionary

Add any combination of scikit-learn regressors to run the dataset through. Candidates can be chosen using the run-through of all linear regressors in scikit-learn at the bottom of the notebook.

In [16]:
sk_regressors = {
    'BRidge':     linear_model.BayesianRidge(n_iter=100000, tol=1e-3, alpha_1=10**-6, alpha_2=10**-0.25, lambda_1=10**-6, lambda_2=10**-6),
    'ElasticNet': linear_model.ElasticNet(alpha=1e-06, l1_ratio=0.1, max_iter=1000000, normalize=True, random_state=1, tol=1e-05, warm_start=False),
#     'ElasticNet0': linear_model.ElasticNet(alpha=1e-05, copy_X=True, fit_intercept=True, l1_ratio=0.3, max_iter=100000, normalize=False,
#                                           positive=False, precompute=False, random_state=1, selection='cyclic', tol=1e-05, warm_start=False),
#     'ElasticNet1': linear_model.ElasticNet(1e-05, l1_ratio=0.1, max_iter=1000000, normalize=True, random_state=1, tol=1e-05, warm_start=False),
#     'ElasticNet2': linear_model.ElasticNet(1e-06, l1_ratio=1.0, max_iter=1000000, normalize=True, random_state=1, tol=1e-05, warm_start=False),
#     'ElasticNet3': linear_model.ElasticNet(1e-05, l1_ratio=0.5, max_iter=1000000, normalize=True, random_state=1, tol=1e-05, warm_start=False),
#     'LassoLarsIC':linear_model.LassoLarsIC(criterion='bic', eps=1e-5),
#     'MNaiveBayes': naive_bayes.MultinomialNB(),
#     'NN': neural_network.MLPRegressor(hidden_layer_sizes=(50,50,50), max_iter=1000),
#     'KRidge':     kernel_ridge.KernelRidge(alpha=10**1.5, kernel='polynomial', gamma=None, degree=1, coef0=1, kernel_params=None),
#     'RF':         ensemble.RandomForestRegressor(n_estimators=50, min_samples_leaf=50),
}
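The hyperparameters above were hand-tuned; they could instead come from a grid search. A sketch with synthetic data (the parameter grid and data are illustrative, not the notebook's actual tuning run):

```python
import numpy as np
from sklearn import linear_model, model_selection

param_grid = {'alpha': [1e-6, 1e-5, 1e-4], 'l1_ratio': [0.1, 0.5, 1.0]}
grid = model_selection.GridSearchCV(
    linear_model.ElasticNet(max_iter=100000, tol=1e-5, random_state=1),
    param_grid, scoring='r2',
    cv=model_selection.KFold(n_splits=4, shuffle=True, random_state=1))

# Synthetic linear data standing in for the transformed training matrix
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + 0.05 * rng.normal(size=200)
grid.fit(X, y)
```

`grid.best_params_` then supplies the settings for the dictionary entry.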

Run the dataset through the above ML algos using shuffled K-fold cross-validation

Output shows $R^{2}$ scores for each regressor in each of the cross-validated trials

In [17]:
n_splits = 4
df_clf = pd.DataFrame(index=sk_regressors.keys(), columns=['CV' + str(i+1) for i in range(n_splits)])
for clf in sk_regressors:
    df_clf.loc[clf, :] = model_selection.cross_val_score(sk_regressors[clf], train, y=df_analysis['YTW'], cv=model_selection.KFold(n_splits=n_splits, shuffle=True, random_state=1), scoring='r2')
df_clf.loc[:, 'AVG'] = df_clf.mean(axis=1)
df_clf.applymap(lambda x: '{:.1f}%'.format(x*100))
Out[17]:
CV1 CV2 CV3 CV4 AVG
ElasticNet 77.6% 79.7% 77.8% 74.7% 77.5%
BRidge 77.7% 79.8% 77.7% 74.8% 77.5%

Select the regressor to use in the analysis/screen and visualize the output

In [34]:
clf = sk_regressors['ElasticNet']
# `train` was already transformed by the pipeline above - no need to re-transform
predicted = clf.fit(train, df_analysis['YTW']).predict(train)
fig, ax = plt.subplots(1, figsize=(15,8))
ax.scatter(predicted, df_analysis['YTW'])
ax.text(0.90, 0.95, r"$R^2$ = {:.2f}".format(metrics.r2_score(df_analysis['YTW'], predicted)), transform=ax.transAxes, fontsize=14,
        verticalalignment='top', bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.7))
ax.set_xlabel("Predicted YTW")
ax.set_ylabel("Observed YTW");
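The screen itself then boils down to ranking bonds by residual: observed YTW minus model YTW, where large positive residuals flag potentially cheap bonds. A minimal sketch; `rank_by_residual` and the demo frame are illustrative stand-ins for the `results` frame built in the following cells.

```python
import pandas as pd

def rank_by_residual(df, actual='YTW_true', fitted='YTW_predicted', n=5):
    """Return the n bonds trading widest of the model (largest residual)."""
    out = df.assign(YTW_diff=df[actual] - df[fitted])
    return out.sort_values('YTW_diff', ascending=False).head(n)

demo = pd.DataFrame({'Issuer': ['A', 'B', 'C'],
                     'YTW_true': [6.0, 4.0, 5.0],
                     'YTW_predicted': [4.5, 4.2, 5.1]})
cheapest = rank_by_residual(demo, n=1)  # issuer 'A' trades 150bp wide of the model
```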

Prepare the results for visualizing in Bokeh

Bokeh hover tooltips don't handle column names containing spaces or special characters, so create a separate dataframe with sanitized column names and apply other formatting specifics.

In [35]:
results = engineer_features(df_clean, excl_no_data_columns=None, drop_perpetual_maturities=True, ytw_cutoff=25,
                            maturity_cutoff=100, fill_mean_columns=None, country_list=country_list, fillna=np.nan,
                            show_exclusion_metrics=False)

results['YTW_true'] = results['YTW']
results['YTW_predicted'] = clf.predict(train)
results['YTW_diff'] = results['YTW'] - results['YTW_predicted']
results = results.drop_duplicates(subset=bond_id_columns)
leverage_groups = [0, 2, 4, 5, 6, 8, 10]
for i in range(len(leverage_groups)-1):
    results.loc[results["Net Debt/EBITDA"].between(leverage_groups[i],leverage_groups[i+1]),"Leverage"] = "{:.0f}-{:.0f}x".format(leverage_groups[i], leverage_groups[i+1])
results['Next Call Date'] = results['Next Call Date'].apply(lambda x: x.strftime('%Y-%m-%d') if isinstance(x, datetime.datetime) else '-')
results['Next Call Price'] = results['Next Call Price'].apply(lambda x: str(x) if pd.notna(x) else '-')
results['Leverage'] = results['Leverage'].fillna('-')
results['Net Debt/EBITDA'] = results['Net Debt/EBITDA'].apply(lambda x: '{:.2f}x'.format(x) if pd.notna(x) else '-')
results['Net Debt/EBITDA Fwd (Parent)'] = results['Net Debt/EBITDA Fwd (Parent)'].apply(lambda x: '{:.2f}x'.format(x) if pd.notna(x) else '-')
results['Total Debt/Capital (Parent)'] = results['Total Debt/Capital (Parent)'].apply(lambda x: '{:.1f}%'.format(x) if pd.notna(x) else '-')
sorted_results_columns = [y for x in [sorted_results_mapping[grp] for grp in group_categories] for y in x]

results_chart = results.drop("Security ID", axis=1).drop_duplicates()
# Sanitize column names for Bokeh hover tooltips
for old, new in [(" ", "_"), ("/", "_"), ("(", ""), (")", ""), ("%", "pct")]:
    results_chart.columns = [column.replace(old, new) for column in results_chart.columns]
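The leverage bucketing loop above can equivalently be written with `pd.cut`; a minimal sketch on synthetic leverage values using the same bin edges (note `pd.cut` buckets on half-open intervals, whereas `between` is inclusive on both ends):

```python
import numpy as np
import pandas as pd

lev = pd.Series([1.5, 3.2, 7.0, np.nan])
edges = [0, 2, 4, 5, 6, 8, 10]
labels = ['{:.0f}-{:.0f}x'.format(a, b) for a, b in zip(edges[:-1], edges[1:])]
# astype(object) so fillna can introduce a value outside the categories
buckets = pd.cut(lev, bins=edges, labels=labels).astype(object).fillna('-')
```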

Visualize the results in Bokeh

Output a scatterplot of all bond issues with X axis showing 'predicted' YTW and Y axis showing the observed YTW. Size the bubbles according to the Amount Outstanding of the Issue and color each bubble according to the bond's rating.

In [39]:
from bokeh.models import HoverTool
from bokeh.plotting import figure, show, output_file
import holoviews as hv
hv.extension('bokeh')

hover = HoverTool(tooltips=[
    ("Issuer", "@Issuer"),
    ("Currency", "@Currency"),
    ("Coupon", "@Coupon"),
    ("Maturity", "@Maturity_Date{%F}"),
    ("Seniority", "@Seniority"),
    ("Rating", "@Rating"),
    ("Amt. Outstanding", "@Amount_Outstanding m"),
    ("Sector", "@{Sector}"),
    ('Country', '@Country_of_Incorporation'),
    ("LTM Leverage", "@Net_Debt_EBITDA"),
    ("NTM Leverage", "@Net_Debt_EBITDA_Fwd_Parent"),
    ("Total Debt/Capital %", "@Total_Debt_Capital_Parent"),
    ("Call Price", "@Next_Call_Price"),
    ("Call Date", "@Next_Call_Date"),
    ("Price", "@Price"),
    ('Current Yield', "@Current_Yield"),
    ('3m % change', '@Price_Change_3m_pct'),
    ("YTW", "@YTW_true"),
    ("YTW est.", "@YTW_predicted"),
    ],
    formatters={
        'Maturity_Date':'datetime',
    },
)

vdims = ['Issuer', 'Seniority', 'Rating', 'Coupon', 'Sector','Country_of_Incorporation', 'Price', 'Maturity_Date', 'Currency',
         'Net_Debt_EBITDA', 'Net_Debt_EBITDA_Fwd_Parent', 'Total_Debt_Capital_Parent', 'Leverage', 'Amount_Outstanding',
         'Next_Call_Price', 'Next_Call_Date', 'Price_Change_3m_pct', 'Current_Yield', 'YTW_true', 'YTW_predicted']
kdims = ['YTW_predicted', 'YTW_true']

df_plotting = results_chart

ratings = [rating for rating in ratings_sorted if rating in (set(df_plotting.Rating.unique()) - set(['', '-', 'NR']))]
hv_palette = hv.Palette('RdYlGn', samples=len(ratings), reverse=False)
palette = hv.plotting.util.process_cmap(hv_palette, ncolors=len(ratings), categorical=True)
explicit_mapping = dict([(k, v) for k, v in zip(ratings, palette)])

opts = dict(tools=[hover], toolbar='right', show_grid=True, width=1150, height=800, alpha=0.7,
            size_index='Amount_Outstanding', color_index='Rating', cmap=explicit_mapping, show_legend=False, scaling_factor=0.04, framewise=True)

scatter = hv.Scatter(data=df_plotting, kdims=kdims, vdims=vdims, group='All Countries').options(**opts)
chart_list = [scatter]

for sector in sorted(df_plotting.Sector.unique()):
    scatter_sector = hv.Scatter(data=df_plotting[(df_plotting["Sector"] == sector)],
                                kdims=kdims, vdims=vdims, group=sector).options(**opts)
    chart_list.append(scatter_sector)

hv.Layout(chart_list).options(tabs=True, toolbar='right', shared_axes=False).cols(1)
Out[39]:

Export html files by country and generate markup for website rendering

In [38]:
import shutil

CHART_MD_TEMPLATE = '''---
title: {country}
layout: page
chart: output
---

[Back to screen list](../bond_screen.html)

{{% raw %}}
<iframe src="/charts/renders/{chart_name}.html"
    style="max-width: 100%; max-height: 100%"
    sandbox="allow-same-origin allow-scripts"
    width="1200"
    height="1000"
    scrolling="no"
    seamless="seamless"
    frameborder="0">
</iframe>
{{% endraw %}}
'''

try:
    shutil.rmtree('charts')
except Exception as e:
    print('Exception', e)

try:
    shutil.rmtree('../github_pages/mkangrga.github.io/mlprojects/charts')
except Exception as e:
    print('Exception', e)

os.makedirs('charts/renders')

hv.renderer('bokeh').save(hv.Layout(chart_list).options(tabs=True, toolbar='right', shared_axes=False).cols(1),'charts/renders/{}'.format('all_countries'))
with open('charts/' + 'all_countries' + '.md', 'w') as f:
    f.write(CHART_MD_TEMPLATE.format(country='All Countries', chart_name='all_countries'))
    
for country in country_list:
    df_plotting = results_chart[results_chart["Country_of_Incorporation"] == country]

    opts = dict(tools=[hover], toolbar='right', show_grid=True, width=1200, height=800, alpha=0.7,
                size_index='Amount_Outstanding', color_index='Rating', cmap=explicit_mapping, show_legend=False, scaling_factor=0.05, framewise=True)
    
    scatter = hv.Scatter(data=df_plotting, kdims=kdims, vdims=vdims, group=country).options(**opts)
    chart_list = [scatter]
    for sector in sorted(df_plotting.Sector.unique()):
        scatter_sector = hv.Scatter(data=df_plotting[df_plotting["Sector"] == sector],
                                    kdims=kdims, vdims=vdims, group=sector).options(**opts)
        chart_list.append(scatter_sector)
    chart_name = country.replace(' ','_').lower()
    
    hv.renderer('bokeh').save(hv.Layout(chart_list).options(tabs=True, toolbar='right', shared_axes=False).cols(1),'charts/renders/{}'.format(chart_name))
    
    with open('charts/' + chart_name + '.md', 'w') as f:
        f.write(CHART_MD_TEMPLATE.format(country=country, chart_name=chart_name))

shutil.copytree('charts', '../github_pages/mkangrga.github.io/mlprojects/charts')
Exception [WinError 3] The system cannot find the path specified: '../github_pages/mkangrga.github.io/mlprojects/charts'
Out[38]:
'../github_pages/mkangrga.github.io/mlprojects/charts'

Output the chart to an HTML file (for an interactive Bokeh server, see a few cells below)

This saves static HTML and PNG renders of the chart layout that can be viewed in any browser without a running Python process. For full interactivity, see the Bokeh server section further below.

In [153]:
hv.renderer('bokeh').save(hv.Layout(chart_list).options(tabs=True, toolbar='right', shared_axes=False).cols(1),'output')
hv.renderer('bokeh').save(hv.Layout(chart_list).options(tabs=True, toolbar='right', shared_axes=False).cols(1), 'output', fmt='png')

Sort and reindex the dataset for a cleaner presentation in Excel and export

In [170]:
def process_index(group_categories, sorted_results_mapping):
    tuple_list = []
    for category in group_categories:
        elements = sorted_results_mapping[category]
        for element in elements:
            tuple_list.append((category, element))
    return tuple_list
results_multiindex = pd.MultiIndex.from_tuples(process_index(group_categories, sorted_results_mapping))
results_present = pd.DataFrame(data=results[sorted_results_columns].values, index=results.index, columns=results_multiindex)
results_present.fillna('-').to_excel('results_output.xlsx')

Output the training dataset to Excel (linear regressors only)

This is meant to assist with analyzing the impact each feature has on the output by manually sensitizing each feature in Excel and immediately seeing the output change.

In [46]:
df_forensics = pd.DataFrame(columns=feature_names)
df_forensics.loc['mean',:] = df_train.mean()
df_forensics.loc['stdev',:] = df_train.std()
df_forensics.loc['coeffs',:] = clf.coef_
df_forensics = df_forensics.append(df_train)
df_forensics.insert(0, 'Intercept', 1)
df_forensics.insert(0, 'Issuer', df_analysis['Issuer'])
df_forensics.loc[['mean', 'stdev'],'Intercept'] = ''
df_forensics.loc['coeffs','Intercept'] = clf.intercept_
n_rows, n_cols = df_forensics.shape

writer = pd.ExcelWriter('fancy.xlsx', engine='xlsxwriter')
df_forensics.to_excel(writer, index=True, sheet_name='train', freeze_panes=(4,1))
workbook = writer.book
train_sheet = writer.sheets['train']
header_format = workbook.add_format({'text_wrap': True, 'bold': True, 'align': 'center'})
header_format.set_border()
for i, col_name in enumerate(df_forensics.columns):
    train_sheet.write(0, i+1, col_name, header_format)
train_sheet.set_row(0, 50)#, cell_format=header_format)
[fmt.set_font_size(8) for fmt in workbook.formats]
scaled_sheet = workbook.add_worksheet('train_scaled')

# Create scaled worksheet, since the regressors expect scaled feature inputs
for row in range(4, n_rows+1):
    for col in range(3, n_cols+1):
        scaled_sheet.write_formula(row, col, '=(train!{}-train!{})/train!{}'.format(xl_rowcol_to_cell(row, col), xl_rowcol_to_cell(1, col), xl_rowcol_to_cell(2, col)))

# Add intercept and the regression formula
for row in range(4, n_rows+1):
    scaled_sheet.write_formula(row, 2, '1')
    train_sheet.write_formula(row, n_cols+2, '=SUMPRODUCT({}:{}, train_scaled!{}:{})'.format(xl_rowcol_to_cell(3,2), xl_rowcol_to_cell(3,n_cols),
                                                                                             xl_rowcol_to_cell(row,2), xl_rowcol_to_cell(row,n_cols)))
train_sheet.write(0, n_cols+2, 'Predicted YTW', header_format)
train_sheet.write(0, n_cols+1, 'Observed YTW', header_format)
for i, value in enumerate(df_analysis['YTW'].values):
    train_sheet.write(i+4, n_cols+1, value)
writer.save()
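What the scaled sheet computes can also be checked in Python: for a linear model on standardized features, bumping one raw feature by one standard deviation moves the prediction by exactly that feature's coefficient. The numbers below are stand-ins for `clf.coef_`, `clf.intercept_`, and the mean/stdev rows written to the sheet.

```python
import numpy as np

coeffs = np.array([0.8, -0.3, 0.1])   # stand-in for clf.coef_
intercept = 5.0                       # stand-in for clf.intercept_
mean = np.array([2.0, 50.0, 10.0])    # per-feature means ('mean' row)
std = np.array([1.5, 20.0, 4.0])      # per-feature stdevs ('stdev' row)

def predict(x_raw):
    """Replicates the sheet's SUMPRODUCT over (x - mean) / stdev plus intercept."""
    return intercept + coeffs @ ((x_raw - mean) / std)

x = np.array([2.5, 40.0, 12.0])
delta = predict(x + np.array([std[0], 0.0, 0.0])) - predict(x)
# delta equals coeffs[0]: a one-stdev shock moves predicted YTW by the coefficient
```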

Start a Bokeh server for interactive visualization outside of the Jupyter Notebook

For more interactive visualization, an active Python server is needed. For viewing on machines without python, this can be achieved by running a bokeh server (below) and pointing the browser to the server IP address and port specified in the Bokeh server instance.

In [295]:
from bokeh.server.server import Server

scatter = hv.Scatter(data=results_chart, kdims=kdims, vdims=vdims, group='Global Bonds').options(**opts)
scatter_us = hv.Scatter(data=results_chart[results_chart["Country_of_Incorporation"] == 'United States'], kdims=kdims, vdims=vdims, group='US Bonds').options(**opts)

# scatter_sector = scatter.groupby(['Country','Sector']).options(**opts)
chart_list = [scatter, scatter_us]
for sector in sorted(results_chart.Sector.unique()):
    scatter_sector = hv.Scatter(data=results_chart[(results_chart["Sector"] == sector) & (results_chart["Country_of_Incorporation"] == 'United States')],
                                kdims=kdims, vdims=vdims, group=sector).groupby('Leverage').options(**opts)
    chart_list.append(scatter_sector)

scatter_leverage = hv.Layout(chart_list).options(tabs=True, toolbar='right', shared_axes=False).cols(1)
renderer = hv.renderer('bokeh')
renderer = renderer.instance(mode='server')
app = renderer.app(scatter_leverage)#, show=True, websocket_origin='localhost:51865')
server = Server({'/': app}, port=51865, allow_websocket_origin=['*'])
server.start()
server.show('/')
INFO:bokeh.server.server:Starting Bokeh server version 0.12.15 (running on Tornado 5.0.1)
WARNING:bokeh.server.util:Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
WARNING:root:LayoutPlot13909: :HoloMap   [Leverage] is empty, skipping subplot.
INFO:tornado.access:200 GET / (::1) 2355.00ms
INFO:tornado.access:101 GET /ws?bokeh-protocol-version=1.0&bokeh-session-id=Q9ZVpy4o6mLagNpQ2oFe8NPCYTqicbngb0I0jGZTZMBL (::1) 1.00ms
INFO:bokeh.server.views.ws:WebSocket connection opened
INFO:bokeh.server.views.ws:ServerConnection created
INFO:bokeh.server.views.ws:WebSocket connection closed: code=1001, reason=None

Stop the Bokeh server

In [296]:
server.stop()

Other interactive visualizations...

In [ ]:
# hv.plotting.list_cmaps()
palette = hv.Palette.colormaps['RdYlGn_r']
palette = hv.plotting.util.process_cmap(palette, ncolors=len(ratings_sorted[:-5]), categorical=True)
explicit_mapping = dict([(k, v) for k, v in zip(ratings_sorted, palette)])
ds = hv.Dataset(data=results_chart[results_chart['Country_of_Incorporation'] == 'United States'],
                kdims=kdims, vdims=vdims)
opts = dict(tools=[hover], toolbar='right', show_grid=True, width=1000, height=800, alpha=0.6,
            size_index='Amount_Outstanding', color_index='Rating', cmap=explicit_mapping, show_legend=False, scaling_factor=0.05, framewise=True)
hm = ds.to(hv.Points, kdims=kdims, vdims=vdims, group='US Bonds').options(**opts)
In [90]:
ds.to(hv.BoxWhisker, 'Rating', 'Net_Debt_EBITDA', groupby=['Sector']).options(width=800, height=800, xrotation=30, box_fill_color=hv.Palette('Category20'), framewise=True)
Out[90]:
In [301]:
opts = dict(tools=[hover], toolbar='right', show_grid=True, width=1000, height=800, alpha=0.8,
            size_index='Amount_Outstanding',color_index='Rating', cmap=explicit_mapping, show_legend=False, scaling_factor=0.05, framewise=True)

scatter_sector = hv.Points(data=results_chart[(results_chart["Country_of_Incorporation"] == 'United States')],
                            kdims=kdims, vdims=vdims).groupby(['Sector', 'Leverage']).options(**opts)
scatter_sector
Out[301]:

Clustering Analysis

If certain bonds tend to cluster together, run the DBSCAN clustering algorithm to identify the clusters and examine the low-density points that fall outside them as potential outliers

In [111]:
from sklearn.cluster import DBSCAN
from sklearn import metrics

X = results[results['Country'] == 'United States'][['YTW_true', 'YTW_predicted']].values
db = DBSCAN(eps=0.66, min_samples=10).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

print('Estimated number of clusters: %d' % n_clusters_)

# #############################################################################
# Plot result (pyplot is already imported at the top of the notebook)

# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
          for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]

    class_member_mask = (labels == k)

    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=14)

    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)

plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
Estimated number of clusters: 2
In [113]:
results_us = results[results['Country'] == 'United States'].copy()
results_us['Label'] = labels
results_us.to_excel('outliers.xlsx')
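
DBSCAN assigns the label -1 to low-density "noise" points, and for this screen those are exactly the outlier candidates worth a closer look. A minimal self-contained sketch on synthetic data (the column names mirror the notebook's results frame, but the numbers are invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(1)
# 50 well-behaved bonds plus one that trades far from the fitted line
demo = pd.DataFrame({
    'YTW_true':      np.r_[rng.normal(6, 0.2, 50), [12.0]],
    'YTW_predicted': np.r_[rng.normal(6, 0.2, 50), [5.0]],
})
labels = DBSCAN(eps=0.66, min_samples=10).fit_predict(
    demo[['YTW_true', 'YTW_predicted']].values)
demo['Label'] = labels
# noise points (label -1) sit far from any dense cluster of bonds
outliers = demo[demo['Label'] == -1]
```

The dense blob of 51st-percentile-style bonds ends up in one cluster, while the single mispriced-looking bond is flagged as noise and surfaces in `outliers`.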

FacetGrid scatterplot of YTW vs Leverage by Sector

In [223]:
sns.lmplot(x='Net Debt/EBITDA', y='YTW', col='Sector', hue='Rating', palette=sns.color_palette("coolwarm", n_colors=len(ratings_sorted)), hue_order=ratings_sorted,
           data=df_analysis[(df_analysis['Rating'] != '-') & (df_analysis['Rating'] != '') & (df_analysis['Price'] > 80)], sharex=False, sharey=False, fit_reg=False, col_wrap=5);

FacetGrid scatterplot of YTW vs Price by Seniority

In [224]:
sns.lmplot(x='Price', y='YTW', col='Seniority', hue='Rating', palette=sns.color_palette("coolwarm", n_colors=len(ratings_sorted)), hue_order=ratings_sorted,
           data=df_analysis[(df_analysis['Rating'] != '-') & (df_analysis['Rating'] != '') & (df_analysis['Price'] > 80)], sharex=False, sharey=False, fit_reg=False, col_wrap=5);

Scikit-Learn regressor selection

Run the regression using every linear regressor in scikit-learn. Some regressors are unsuitable for this purpose or take a very long time; screen those out by first timing each one on a small 500-row sample of the data

In [304]:
# run regression using all linear regressors with default settings
from sklearn.utils.testing import all_estimators
from sklearn import base
import time
import warnings

linear_regressors = {}
for name, class_ in all_estimators(type_filter='regressor'):
    if issubclass(class_, linear_model.base.LinearModel):
        linear_regressors[name] = class_()
# exclude TheilSenRegressor up front (known to be very slow)
del linear_regressors['TheilSenRegressor']

time_dict = {}
for clf in linear_regressors:
    start_time = time.time()
    try:
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            model_selection.cross_val_score(linear_regressors[clf], train[:500], y=df_analysis['YTW'][:500], cv=model_selection.KFold(shuffle=True, random_state=1), scoring='r2')
        time_dict[clf] = time.time() - start_time
    except Exception:
        # this regressor failed on the sample data; assign a sentinel
        # time so the screen below discards it
        time_dict[clf] = 100
        
fig, ax = plt.subplots(1)
ax.bar(time_dict.keys(), time_dict.values())
ax.set_ylim([0, 0.5])
ax.set_xticklabels(time_dict.keys(), rotation=45, horizontalalignment='right');
ax.set_ylabel('Time to complete (seconds)');

Run the regression using selected regressors on the full dataset

Using default hyperparameters; this is not optimal and should eventually be done with BayesSearchCV (from scikit-optimize), but it is a decent start.
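
As a hedged sketch of what that tuning step would look like: `BayesSearchCV` from scikit-optimize is a drop-in replacement for scikit-learn's search classes, so the wiring below (shown with `GridSearchCV` on synthetic data to stay self-contained) carries over directly; only the search-space specification changes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=1)

# log-spaced grid over Ridge's regularization strength; BayesSearchCV
# would take a Real(1e-3, 1e2, prior='log-uniform') search space instead
search = GridSearchCV(
    Ridge(),
    param_grid={'alpha': np.logspace(-3, 2, 11)},
    cv=KFold(shuffle=True, random_state=1),
    scoring='r2',
)
search.fit(X, y)
best_alpha = search.best_params_['alpha']
```

The same `cv` and `scoring` arguments used in the cross-validation cells above plug straight into either search class.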

In [320]:
regressor_scores = {}
for clf in linear_regressors:
    if time_dict[clf] < 0.2:
        regressor_scores[clf] = model_selection.cross_val_score(linear_regressors[clf], train, y=df_analysis['YTW'], cv=model_selection.KFold(shuffle=True, random_state=1), scoring='r2')
fig, ax = plt.subplots(1)
df_regressor_scores = pd.DataFrame(regressor_scores)
ax.bar(df_regressor_scores.mean().index, df_regressor_scores.mean().values)
ax.set_ylim([-0.5, 1])
ax.set_xticklabels(df_regressor_scores.mean().index, rotation=45, horizontalalignment='right');
In [616]:
df_analysis.groupby(['Parent Name', 'Seniority'])[['Net Debt (Parent)', 'Amount Outstanding']].sum().sort_index().unstack().to_excel('a.xlsx')
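
The one-liner above is dense; a hypothetical miniature on toy data shows what the `groupby(...).sum().unstack()` pivot does, namely moving Seniority from the row index into the column axis:

```python
import pandas as pd

df = pd.DataFrame({
    'Parent Name': ['A', 'A', 'B', 'B'],
    'Seniority': ['Senior', 'Sub', 'Senior', 'Senior'],
    'Amount Outstanding': [100, 50, 200, 25],
})
# sum per (parent, seniority), then pivot seniority into columns;
# missing combinations (B has no Sub bonds) become NaN
pivot = df.groupby(['Parent Name', 'Seniority'])['Amount Outstanding'].sum().unstack()
```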